Deep reinforcement learning (DRL) is a subfield of machine learning that combines principles of reinforcement learning (RL) and deep learning. It involves training agents to make decisions by interacting with an environment to maximize cumulative rewards, while using deep neural networks to represent policies, value functions, or environment models. This integration enables DRL systems to process high-dimensional inputs, such as images or continuous control signals, making the approach effective for solving complex tasks. Since the introduction of the deep Q-network (DQN) in 2015, DRL has achieved significant successes across domains including games, robotics, and autonomous systems, and is increasingly applied in areas such as healthcare, finance, and autonomous vehicles.

Deep reinforcement learning

Introduction

Deep reinforcement learning (DRL) is a part of machine learning that combines reinforcement learning (RL) and deep learning. In DRL, agents learn to make decisions by interacting with environments in order to maximize cumulative rewards, while using deep neural networks to represent policies, value functions, or models of the environment. This integration enables agents to handle high-dimensional input spaces, such as raw images or continuous control signals, making DRL a widely used approach for addressing complex tasks. [Li, Yuxi. "Deep Reinforcement Learning: An Overview." arXiv preprint arXiv:1701.07274 (2018). https://arxiv.org/abs/1701.07274] Since the development of the deep Q-network (DQN) in 2015, DRL has led to major breakthroughs in domains such as games, robotics, and autonomous systems. Research in DRL continues to expand rapidly, with active work on challenges like sample efficiency and robustness, as well as innovations in model-based methods, transformer architectures, and open-ended learning. Applications now range from healthcare and finance to language systems and autonomous vehicles. [Arulkumaran, Kai, et al. "A brief survey of deep reinforcement learning." arXiv preprint arXiv:1708.05866 (2017). https://arxiv.org/abs/1708.05866]

Background

Reinforcement learning (RL) is a framework in which agents interact with environments by taking actions and learning from feedback in the form of rewards or penalties. Traditional RL methods, such as Q-learning and policy gradient techniques, rely on tabular representations or linear approximations, which often do not scale to high-dimensional or continuous input spaces. DRL emerged as a solution to this limitation by integrating RL with deep neural networks. This combination enables agents to approximate complex functions and handle unstructured input data like raw images, sensor data, or natural language. The approach became widely recognized following the success of DeepMind's deep Q-network (DQN), which achieved human-level performance on several Atari video games using only pixel inputs and game scores as feedback. Since then, DRL has evolved to include various architectures and learning strategies, including model-based methods, actor-critic frameworks, and applications in continuous control environments. These developments have significantly expanded the applicability of DRL across domains where traditional RL was limited.
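To make the tabular starting point concrete, the following is a minimal sketch of tabular Q-learning. The corridor environment, the `step` function, and all hyperparameter values are hypothetical and chosen only for illustration; they do not come from the sources cited here. The table holds one entry per (state, action) pair, which is why this representation stops being practical once the state is something like an image.

```python
import random

# Hypothetical toy environment: a 1-D corridor with states 0..4.
# Action 0 moves left, action 1 moves right; reaching state 4 yields reward 1.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def train_q_table(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table with epsilon-greedy exploration (illustrative settings)."""
    rng = random.Random(seed)
    # One entry per (state, action) pair: feasible only for small, discrete spaces.
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)  # explore
            else:
                # Exploit, breaking ties between equal values at random.
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(N_ACTIONS) if q[state][a] == best]
                )
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train_q_table()
# Greedy action in each non-terminal state after training.
greedy_policy = [
    max(range(N_ACTIONS), key=lambda a: q_table[s][a]) for s in range(GOAL)
]
```

In a deep RL setting, the list-of-lists `q` would be replaced by a neural network that maps a state to a vector of action values, while the update target keeps the same form.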

Key algorithms and methods

Several algorithmic approaches form the foundation of deep reinforcement learning, each with different strategies for learning optimal behavior.

* The Deep Q-Network (DQN) is one of the earliest and most influential DRL algorithms. It combines Q-learning with deep neural networks: DQN approximates the optimal action-value function using a convolutional neural network and relies on techniques such as experience replay and target networks, which stabilize training.
* Policy gradient methods directly optimize the agent's policy by adjusting parameters in the direction that increases expected rewards. These methods are well-suited to high-dimensional or continuous action spaces and form the basis of many modern DRL algorithms.
* Actor-critic algorithms combine the advantages of value-based and policy-based methods. The actor updates the policy, while the critic evaluates the current policy using a value function. Popular variants include A2C (Advantage Actor-Critic) and PPO (Proximal Policy Optimization), both of which are widely used in benchmarks and real-world applications.

Other methods include multi-agent reinforcement learning, hierarchical RL, and approaches that integrate planning or memory mechanisms, depending on the complexity of the task and environment.

Figure: Reinforcement learning diagram.
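The two stabilizing techniques named for DQN, experience replay and target networks, can be sketched in a few lines. This is a simplified illustration rather than the published implementation: `ReplayBuffer`, `td_targets`, and `toy_target_q` are hypothetical names, and the "network" is a stand-in function instead of a convolutional model.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma=0.99):
    """Compute y = r + gamma * max_a' Q_target(s', a') for each transition.

    `target_q` is a hypothetical callable mapping a state to a list of action
    values. In DQN it would be a periodically synchronized copy of the online
    network, so the regression targets do not shift after every gradient step.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets


def toy_target_q(state):
    """Stand-in for a frozen target network with two actions (assumption)."""
    return [0.5 * state, 1.0 * state]


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(state=t, action=0, reward=1.0, next_state=t + 1, done=(t == 4))

batch = buffer.sample(2)  # two transitions drawn from the three most recent
targets = td_targets(batch, toy_target_q, gamma=0.9)
```

The online network would then be trained to regress its value for the taken action onto these targets, and its weights copied into the target network at a fixed interval.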

Applications

DRL has been applied to a wide range of domains that require sequential decision-making and the ability to learn from high-dimensional input data. One of the most well-known applications is in games, where DRL agents have demonstrated performance comparable to or exceeding human-level benchmarks. DeepMind's AlphaGo and AlphaStar, as well as OpenAI Five, are notable examples of DRL systems mastering complex games such as Go, StarCraft II, and Dota 2. While these systems have demonstrated high performance in constrained environments, their success often depends on extensive computational resources and may not generalize easily to tasks outside their training domains.

In robotics, DRL has been used to train agents for tasks such as locomotion, manipulation, and navigation in both simulated and real-world environments. By learning directly from sensory input, DRL enables robots to adapt to complex dynamics without relying on hand-crafted control rules.

Other growing areas of application include finance (e.g., portfolio optimization), healthcare (e.g., treatment planning and medical decision-making), natural language processing (e.g., dialogue systems), and autonomous vehicles (e.g., path planning and control). All of these applications show how DRL deals with real-world problems like uncertainty, sequential reasoning, and high-dimensional data.

Challenges and limitations

DRL faces several significant challenges that limit its broader deployment. One of the most prominent is sample inefficiency: DRL algorithms often require millions of interactions with the environment to learn effective policies, which is impractical in many real-world settings where data collection is expensive or time-consuming. Another challenge is the sparse or delayed reward problem, where feedback signals are infrequent, making it difficult for agents to attribute outcomes to specific decisions. Techniques such as reward shaping and exploration strategies have been developed to address this issue.

DRL systems also tend to be sensitive to hyperparameters and lack robustness across tasks or environments. Models trained in simulation often fail when deployed in the real world because of discrepancies between simulated and real-world dynamics, a problem known as the "reality gap." Bias and fairness in DRL systems have also emerged as concerns, particularly in domains like healthcare and finance, where imbalanced data can lead to unequal outcomes for underrepresented groups. Additionally, concerns about safety, interpretability, and reproducibility have become increasingly important, especially in high-stakes domains such as healthcare or autonomous driving. These issues remain active areas of research in the DRL community.
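As an illustration of reward shaping for sparse rewards, the sketch below uses potential-based shaping, one well-known form of the technique; the text above does not say which form is meant, so this choice is an assumption. The goal position, the `potential` function, and the convention of treating the terminal potential as zero are likewise hypothetical.

```python
GOAL_POSITION = 10  # hypothetical goal on a 1-D line


def potential(state):
    """Heuristic potential: negative distance to the goal (an assumption)."""
    return -abs(GOAL_POSITION - state)


def shaped_reward(reward, state, next_state, potential_fn, gamma=0.99, done=False):
    """Add the potential-based shaping term F = gamma * phi(s') - phi(s).

    The terminal potential is treated as zero here, a common convention
    that this sketch assumes.
    """
    next_potential = 0.0 if done else potential_fn(next_state)
    return reward + gamma * next_potential - potential_fn(state)


# The environment reward is 0 everywhere except at the goal (sparse), yet the
# shaped signal already distinguishes progress from regress at every step.
toward = shaped_reward(0.0, state=3, next_state=4, potential_fn=potential, gamma=1.0)  # 1.0
away = shaped_reward(0.0, state=3, next_state=2, potential_fn=potential, gamma=1.0)    # -1.0
```

The agent is trained on the shaped value instead of the raw reward, which gives it a learning signal long before it first reaches the goal.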

Recent advances

Recent developments in DRL have introduced new architectures and training strategies that aim to improve performance, efficiency, and generalization. One key area of progress is model-based reinforcement learning, where agents learn an internal model of the environment to simulate outcomes before acting. This kind of approach can improve sample efficiency and planning. An example is the Dreamer algorithm, which learns a latent-space model to train agents more efficiently in complex environments.

Another major innovation is the use of transformer-based architectures in DRL. Compared with models that rely on recurrent or convolutional networks, transformers can model long-term dependencies more effectively. The Decision Transformer and similar models treat RL as a sequence modeling problem, which can help agents generalize across tasks.

In addition, research into open-ended learning has produced agents that can solve a range of tasks without task-specific tuning. Systems such as those developed by OpenAI suggest that agents trained in diverse, evolving environments can generalize to new challenges, moving toward more adaptive and flexible intelligence.
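The sequence-modeling view can be illustrated by the data preparation step alone. The sketch below computes undiscounted returns-to-go and interleaves them with states and actions, which is the kind of input sequence a Decision Transformer style model is trained on. The function names, the tuple-based token format, and the example trajectory are hypothetical; the transformer itself is omitted.

```python
def returns_to_go(rewards):
    """For each timestep t, sum the rewards from t to the end of the episode."""
    rtg, running = [], 0.0
    for reward in reversed(rewards):
        running += reward
        rtg.append(running)
    return rtg[::-1]


def to_token_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) triples into one flat sequence."""
    sequence = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        sequence.extend([("return_to_go", g), ("state", s), ("action", a)])
    return sequence


# Hypothetical three-step trajectory with a single reward at the end.
states = ["s0", "s1", "s2"]
actions = [1, 0, 1]
rewards = [0.0, 0.0, 1.0]

tokens = to_token_sequence(states, actions, rewards)
```

A sequence model trained to predict each action token from the preceding tokens can then be prompted at test time with a desired return-to-go, which is how this formulation replaces explicit value estimation.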

Future directions

As deep reinforcement learning continues to evolve, researchers are exploring ways to make algorithms more efficient, robust, and generalizable across a wide range of tasks. Improving sample efficiency through model-based learning, enhancing generalization with open-ended training environments, and integrating foundation models are among the current research goals. A related area of interest is safe and ethical deployment, particularly in high-risk settings like healthcare, autonomous driving, and finance. Researchers are developing frameworks for safer exploration, interpretability, and better alignment with human values. Ensuring that DRL systems promote equitable outcomes remains an ongoing challenge, especially where historical data may under-represent marginalized populations. The future of DRL may also involve more integration with other subfields of machine learning, such as unsupervised learning, transfer learning, and large language models, enabling agents that can learn from diverse data modalities and interact more naturally with human users. [OpenAI et al. "Open-ended learning leads to generally capable agents." arXiv preprint arXiv:2302.06622 (2023). https://arxiv.org/abs/2302.06622]
